Decision trees are PAC-learnable from most product distributions: a smoothed analysis

Authors

  • Adam Tauman Kalai
  • Shang-Hua Teng
Abstract

We consider the problem of PAC-learning decision trees, i.e., learning a decision tree over the n-dimensional hypercube from independent random labeled examples. Despite significant effort, no polynomial-time algorithm is known for learning polynomial-sized decision trees (even trees of any super-constant size), even when examples are assumed to be drawn from the uniform distribution on {0,1}^n. We give an algorithm that learns arbitrary polynomial-sized decision trees for most product distributions. In particular, consider a random product distribution where the bias of each bit is chosen independently and uniformly from, say, [.49, .51]. Then with high probability over the parameters of the product distribution and the random examples drawn from it, the algorithm will learn any tree. More generally, in the spirit of smoothed analysis, we consider an arbitrary product distribution whose parameters are specified only up to a [−c, c] accuracy (perturbation), for an arbitrarily small positive constant c.
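
To make the smoothed setting concrete, the following is a minimal Python sketch of the example-generation process the abstract describes: each bit's bias is drawn independently and uniformly from [.49, .51], and labeled examples are then drawn from the resulting product distribution. The small tree and all parameter values below are hypothetical illustrations, not the paper's algorithm.

    import random

    def sample_biases(n, center=0.5, c=0.01):
        # Bias of each bit chosen independently and uniformly from
        # [center - c, center + c], e.g. [.49, .51] for c = .01.
        return [random.uniform(center - c, center + c) for _ in range(n)]

    def draw_example(biases):
        # One example from the product distribution: bit i is 1 with
        # probability biases[i].
        return [1 if random.random() < p else 0 for p in biases]

    def tree_label(x):
        # Hypothetical small decision tree over bits 0, 1, 2.
        if x[0] == 1:
            return 1 if x[1] == 1 else 0
        return x[2]

    biases = sample_biases(n=10)
    examples = [(x, tree_label(x)) for x in (draw_example(biases) for _ in range(5))]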

Similar articles

Learning Using Local Membership Queries

We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are close to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among mach...
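
As an illustration of the restriction this model imposes, here is a minimal sketch, assuming "close" means small Hamming distance: the learner may only query points within a small radius of an example drawn from the distribution. The target function and the radius are hypothetical choices, not taken from the paper.

    import random

    def local_queries(x, radius=1):
        # Points within Hamming distance `radius` of the random example x;
        # only radius 0 and 1 are implemented in this sketch.
        yield list(x)
        if radius >= 1:
            for i in range(len(x)):
                y = list(x)
                y[i] ^= 1  # flip one bit
                yield y

    target = lambda x: x[0] ^ x[1]                # hypothetical MQ oracle
    x = [random.randint(0, 1) for _ in range(4)]  # a random example
    answers = [(y, target(y)) for y in local_queries(x)]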

Learning using Local Membership Queries under Smooth Distributions

We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are close to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among mach...

On the Proper Learning of Axis Parallel Concepts

We study the proper learnability of axis-parallel concept classes in the PAC-learning and exact-learning models. These classes include union of boxes, DNF, decision trees and multivariate polynomials. For constant-dimensional axis-parallel concepts C we show that the following problems have time complexities that are within a polynomial factor of each other. 1. C is α-properly exactly learnable ...

Rank-r Decision Trees are a Subclass of r-Decision Lists

Rivest [5] defines the notion of a decision list as a representation for Boolean functions. He shows that k-decision lists, a generalization of k-CNF and k-DNF formulas, are learnable for constant k in the PAC (or distribution-free) learning model [&,3]. Ehrenfeucht and Haussler [1] define the notion of the rank of a decision tree, and prove that decision trees of constant rank are also learnab...
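
For readers unfamiliar with the representation, here is a minimal sketch of how a k-decision list is evaluated: an ordered sequence of rules, each a conjunction of at most k literals paired with an output bit, falling through to a default. The particular list below is a hypothetical example.

    def eval_decision_list(rules, default, x):
        # Each rule is (term, bit), where term is a conjunction of at most
        # k literals given as (index, required value) pairs; return the bit
        # of the first satisfied term, else the default bit.
        for term, bit in rules:
            if all(x[i] == v for i, v in term):
                return bit
        return default

    # Hypothetical 2-decision list over 3 bits:
    #   if x0 = 1 and x1 = 0 -> 1;  elif x2 = 1 -> 0;  else -> 1
    rules = [([(0, 1), (1, 0)], 1), ([(2, 1)], 0)]
    print(eval_decision_list(rules, 1, [1, 0, 1]))  # -> 1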

On Using Extended Statistical Queries to Avoid Membership Queries

The Kushilevitz–Mansour (KM) algorithm finds all the "large" Fourier coefficients of a Boolean function. It is the main tool for learning decision trees and DNF expressions in the PAC model with respect to the uniform distribution. The algorithm requires access to the membership query (MQ) oracle. This access is often unavailable in learning applications and thus the KM algo...
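
For context, a single Fourier coefficient f_hat(S) = E[f(x) * chi_S(x)] for a fixed set S can be estimated from uniform random examples alone; the harder task KM solves with membership queries is searching over the exponentially many candidate sets S. A minimal sketch of the fixed-S estimate, with a hypothetical target function:

    import random

    def chi(S, x):
        # Parity character chi_S(x) = (-1)^(sum of x_i for i in S).
        return -1 if sum(x[i] for i in S) % 2 else 1

    def estimate_fourier(f, S, n, m=20000):
        # Monte Carlo estimate of f_hat(S) = E[f(x) * chi_S(x)] over
        # uniform x in {0,1}^n, for a +/-1-valued f.
        total = 0
        for _ in range(m):
            x = [random.randint(0, 1) for _ in range(n)]
            total += f(x) * chi(S, x)
        return total / m

    # Hypothetical target: parity of bits 0 and 2, so f_hat({0,2}) = 1.
    f = lambda x: -1 if x[0] ^ x[2] else 1
    print(estimate_fourier(f, {0, 2}, n=4))  # close to 1.0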

Journal title:
  • CoRR

Volume: abs/0812.0933

Publication date: 2008